Applying Software Analysis Technology to Lightweight Semantic Markup of Document Text

Authors

  • Nadzeya Kiyavitskaya
  • Nicola Zeni
  • James R. Cordy
  • Luisa Mich
  • John Mylopoulos

Abstract

Software analysis techniques, and in particular software “design recovery”, have been highly successful at both technical and business-level semantic markup of large-scale software systems written in a wide variety of programming languages. In particular, they have proven efficient and scalable in helping to resolve the “year 2000” problem for billions of lines of legacy source code. In this work we describe a first experiment in applying the same technical solutions and tools that have proven so successful in software markup to the more general problem of semantic markup of text documents. In this early report we describe our adaptation of the software analysis techniques, propose a general domain-independent architecture for semantic markup using them, and demonstrate its feasibility in a limited but realistic application domain by comparison with both raw and tool-assisted human semantic markers.

Similar resources

A Joint Semantic Vector Representation Model for Text Clustering and Classification

Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...

Exploiting Latent Semantic Relations in Highly Linked Hypertext for Information Retrieval in Wikis

Good hypertext writing style mandates that link texts clearly indicate the nature of the link target. While this guideline is routinely ignored in HTML, the lightweight markup languages used by wikis encourage or even force hypertext authors to use semantically appropriate link texts. This property of wiki hypertext makes it an ideal candidate for processing with latent semantic analysis, a fac...

Cerno: Light-weight tool support for semantic annotation of textual documents

Enrichment of text documents with semantic metadata reflecting their meaning facilitates document organization, indexing and retrieval. However, most web data remain unstructured because of the difficulty and the cost of manually annotating text. In this work, we present Cerno, a framework for semi-automatic semantic annotation of textual documents according to a domain-specific semantic model....

A Study of the Application of Semantic Technology to Organizing Information in Digital Library Software

This study investigated the use of semantic technologies to organize information in digital library software systems. It was an applied study that employed a descriptive survey method. The sample consisted of three digital library software systems: Pars Azarakhsh, Parvan Pajoh, and Payam Mashregh. Data were collected through a checklist incl...

Linguistic Annotation for the Semantic Web

Establishing the semantic web on a large scale implies the widespread annotation of web documents with ontology-based knowledge markup. For this purpose, tools have been developed that allow for semi-automatic annotation of web documents with ontology-based metadata. However, given that a large number of web documents consist either fully or at least partially of free text, language technology ...

Publication date: 2005